Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal
Medical images often contain artificial markers added by doctors, which can
negatively affect the accuracy of AI-based diagnosis. To address this issue and
recover the missing visual contents, inpainting techniques are highly needed.
However, existing inpainting methods require manual mask input, limiting their
application scenarios. In this paper, we introduce a novel blind inpainting
method that automatically completes visual contents without specifying masks
for target areas in an image. Our proposed model includes a mask-free
reconstruction network and an object-aware discriminator. The reconstruction
network consists of two branches that predict the corrupted regions with
artificial markers and simultaneously recover the missing visual contents. The
object-aware discriminator relies on the powerful recognition capabilities of
the dense object detector to ensure that the markers of reconstructed images
cannot be detected in any local regions. As a result, the reconstructed image
is as close as possible to the clean one. Our proposed method is
evaluated on different medical image datasets, covering multiple imaging
modalities such as ultrasound (US), magnetic resonance imaging (MRI), and
electron microscopy (EM), demonstrating that our method is effective and robust
against various unknown missing region patterns.
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
When applying a pre-trained 2D-to-3D human pose lifting model to a target
unseen dataset, large performance degradation is commonly encountered due to
domain shift issues. We observe that the degradation is caused by two factors:
1) the large distribution gap over global positions of poses between the source
and target datasets due to variant camera parameters and settings, and 2) the
deficient diversity of local structures of poses in training. To this end, we
combine global adaptation and local generalization in
PoseDA, a simple yet effective framework of unsupervised domain
adaptation for 3D human pose estimation. Specifically, global adaptation aims
to align global positions of poses from the source domain to the target domain
with a proposed global position alignment (GPA) module, while local
generalization is designed to enhance the diversity of 2D-3D pose mapping with
a local pose augmentation (LPA) module. These modules bring significant
performance improvement without introducing additional learnable parameters. In
addition, LPA enhances the diversity of 3D poses following an adversarial
training scheme consisting of 1) an
augmentation generator that generates the parameters of pre-defined pose
transformations and 2) an anchor discriminator to ensure the realism and
quality of the augmented data. Our approach is applicable to almost all
2D-3D lifting models. PoseDA achieves 61.3 mm of MPJPE on MPI-INF-3DHP
under a cross-dataset evaluation setup, improving upon the previous
state-of-the-art method by 10.2%.
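The MPJPE figure quoted above is a mean Euclidean distance between predicted and ground-truth joints. A minimal sketch of that metric in plain Python follows, assuming each pose is a list of (x, y, z) joint coordinates in millimetres with matching joint order. The function name `mpjpe` and the toy poses are illustrative and not taken from PoseDA; evaluation protocols commonly align poses at a root joint first, and that step is omitted here.

```python
import math

def mpjpe(predicted, target):
    """Mean per-joint position error between two poses with matching joint order."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # Euclidean distance per joint, averaged over all joints.
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(predicted)

# Toy example with two joints (hypothetical values, in mm).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, gt)  # joint errors are 0 and 5, so the mean is 2.5
```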
A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision
Deep learning has the potential to revolutionize sports performance, with
applications ranging from perception and comprehension to decision. This paper
presents a comprehensive survey of deep learning in sports performance,
focusing on three main aspects: algorithms, datasets and virtual environments,
and challenges. Firstly, we discuss the hierarchical structure of deep learning
algorithms in sports performance which includes perception, comprehension and
decision while comparing their strengths and weaknesses. Secondly, we list
widely used existing datasets in sports and highlight their characteristics and
limitations. Finally, we summarize current challenges and point out future
trends of deep learning in sports. Our survey provides valuable reference
material for researchers interested in deep learning in sports applications.
Experimental investigation on the flexural mechanical behaviour of an immersion joint
The immersed tunnelling technique is commonly used for river or sea crossings worldwide. Seismic safety criteria of immersed tunnels involve the shear stiffness, axial stiffness, flexural stiffness, and opening deformations of the immersion joints. Therefore, a mechanical analysis of the joint between immersed tunnel elements is necessary. This paper presents an experiment on an immersion joint, covering the experiment design and the axial and flexural behaviour of the joint. The geometric scale of the experiment is 1:10. The model joint consists of two 3.8 m x 1.15 m x 1.2 m segments with a rubber gasket and horizontal steel shear keys between them. Different levels of water pressure were considered because water depth changes significantly in real projects. The displacements of the immersion joint under multi-level loads were measured and analysed, taking into account the hyper-elastic property of the GINA gasket. The mechanical behaviour of the GINA gasket is significantly affected by both flexural and axial loading. Moreover, the ratio of the flexural stiffness of the joint to that of the tunnel element in service states ranges from 1/27 to 1/272. The results are useful for further numerical analysis of immersion joints, and more related publications are expected in the future.
DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models
Image-based fashion design with AI techniques has attracted increasing
attention in recent years. We focus on a new fashion design task, where we aim
to transfer a reference appearance image onto a clothing image while preserving
the structure of the clothing image. It is a challenging task since there are
no reference images available for the newly designed output fashion images.
Although diffusion-based image translation or neural style transfer (NST) has
enabled flexible style transfer, it is often difficult to maintain the original
structure of the image realistically during the reverse diffusion, especially
when the referenced appearance image greatly differs from the common clothing
appearance. To tackle this issue, we present a novel diffusion model-based
unsupervised structure-aware transfer method to semantically generate new
clothes from a given clothing image and a reference appearance image.
Specifically, we decouple the foreground clothing with semantic masks generated
automatically from conditioned labels. The mask is then used as guidance
in the denoising process to preserve the structure information. Moreover, we
use the pre-trained vision Transformer (ViT) for both appearance and structure
guidance. Our experimental results show that the proposed method outperforms
state-of-the-art baseline models, generating more realistic images in the
fashion design task. Code and demo can be found at
https://github.com/Rem105-210/DiffFashion
Devil in the Number: Towards Robust Multi-modality Data Filter
To filter web-scale multi-modality datasets appropriately, it
is crucial to employ suitable filtering methods that boost performance and
reduce training costs. For instance, the LAION papers employ the CLIP score filter
to select data with CLIP scores surpassing a certain threshold. On the other
hand, T-MARS achieves high-quality data filtering by detecting and masking text
within images and then filtering by CLIP score. Through analyzing the dataset,
we observe a significant proportion of redundant information, such as numbers,
present in the textual content. Our experiments on a subset of the data unveil
the profound impact of these redundant elements on the CLIP scores. A logical
approach would involve reevaluating the CLIP scores after eliminating these
influences. Experimentally, our text-based CLIP filter outperforms the
top-ranked method on the "small scale" of DataComp (a data filtering
benchmark) on ImageNet distribution shifts, achieving a 3.6% performance
improvement. The results also demonstrate that our proposed text-masked filter
outperforms the original CLIP score filter when selecting the top 40% of the
data. The impact of numbers on CLIP and their handling provide valuable
insights for improving the effectiveness of CLIP training, including language
rewrite techniques.
Comment: ICCV 2023 Workshop: TNGCV-DataCom
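The filtering idea described above (remove numbers from the caption, re-score, keep the top 40%) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact masking rules, `score_fn` is a hypothetical stand-in for a CLIP image-text similarity, and `toy_score` with its tag sets is invented so the example runs without a model.

```python
import re

def mask_numbers(caption):
    """Drop digit runs from a caption and collapse leftover whitespace."""
    return " ".join(re.sub(r"\d+", " ", caption).split())

def filter_top_fraction(pairs, score_fn, keep_fraction=0.4):
    """Keep the highest-scoring (image, caption) pairs after masking numbers.

    score_fn(image, caption) -> float is a placeholder for a CLIP similarity.
    """
    scored = [(score_fn(image, mask_numbers(caption)), image, caption)
              for image, caption in pairs]
    # Sort on the score only; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    keep = int(len(scored) * keep_fraction)
    return [(image, caption) for _, image, caption in scored[:keep]]

def toy_score(image_tags, caption):
    """Hypothetical scorer: share of caption words that appear in the image tags."""
    words = caption.lower().split()
    return sum(w in image_tags for w in words) / max(len(words), 1)

pairs = [
    ({"red", "dress"}, "red dress 2023 0415"),
    ({"dog"}, "IMG 1234"),
    ({"cat"}, "a cat"),
    ({"car"}, "blue sky"),
    ({"tree"}, "tree 7"),
]
kept = filter_top_fraction(pairs, toy_score)
```

With the toy scorer, the two captions that match their tags once the digits are removed are the ones retained.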
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
Learning-based methods have dominated the 3D human pose estimation (HPE)
tasks with significantly better performance in most benchmarks than traditional
optimization-based methods. Nonetheless, 3D HPE in the wild is still the
biggest challenge of learning-based models, whether with 2D-3D lifting,
image-to-3D, or diffusion-based methods, since the trained networks implicitly
learn camera intrinsic parameters and domain-based 3D human pose distributions
and estimate poses by statistical average. On the other hand, the
optimization-based methods estimate results case-by-case, which can predict
more diverse and sophisticated human poses in the wild. By combining the
advantages of optimization-based and learning-based methods, we propose the
Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the
problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO
achieves state-of-the-art (SOTA) performance on Human3.6M as minMPJPE mm
without training with any 2D-3D or image-3D pairs. Moreover, our
single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE
mm on cross-dataset evaluation, which even outperforms learning-based
methods trained on 3DPW.
Early warning analysis of mountain flood disaster based on Copula function risk combination
Mountain torrent disaster prevention is the focus of flood control and disaster reduction in China. Critical rainfall is an important indicator that determines the success or failure of mountain torrent disaster early warning. In this paper, the M-Copula function is introduced, the multi-dimensional joint distribution of critical rainfall is constructed, and the joint distribution of rainfall and peak rainfall intensity is analyzed. Taking A village in Xinxian County as an example, the critical rainfall of the combined probability is calculated, and the critical rainfall for the flash flood disaster water level, the pre-shift warning and the sharp-shift warning is analyzed. The results show that the flood peak modulus calculated by Yishangfan group is 8.89, which follows the general pattern of flood peak moduli for rivers in hilly areas: the larger the basin area, the smaller the flood peak modulus, and vice versa. The calculated design flow of 533 m³/s is reasonable. Selecting the M-Copula function as the connection function to fit the joint distribution of rainfall and peak rainfall intensity is reasonable and reliable, and can provide theoretical support for flash flood disaster warning in other regions.
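The abstract does not define the M-Copula. One common construction, assumed here, is a weighted mixture of the Gumbel, Clayton and Frank copulas. The sketch below evaluates such a mixture and the joint exceedance probability P(U > u, V > v) = 1 - u - v + C(u, v), where u and v would be the marginal CDF values of rainfall and peak rainfall intensity. All weights and parameters are illustrative placeholders, not values from the paper.

```python
import math

def gumbel(u, v, theta):
    """Gumbel copula for u, v in (0, 1] and theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def clayton(u, v, theta):
    """Clayton copula for u, v in (0, 1] and theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def frank(u, v, theta):
    """Frank copula for u, v in [0, 1] and theta != 0."""
    ratio = math.expm1(-theta * u) * math.expm1(-theta * v) / math.expm1(-theta)
    return -math.log1p(ratio) / theta

def mixed_copula(u, v, weights, thetas):
    """Convex combination of the Gumbel, Clayton and Frank copulas."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    families = (gumbel, clayton, frank)
    return sum(w * c(u, v, t) for w, c, t in zip(weights, families, thetas))

def joint_exceedance(u, v, weights, thetas):
    """P(U > u, V > v) for uniform marginals U, V coupled by the mixture."""
    return 1.0 - u - v + mixed_copula(u, v, weights, thetas)

# Placeholder weights and dependence parameters (assumptions, not fitted values).
weights = (0.5, 0.3, 0.2)
thetas = (2.0, 1.5, 3.0)
p_both = joint_exceedance(0.9, 0.8, weights, thetas)
```

In practice the weights and dependence parameters would be fitted to observed rainfall data; the fitting step is outside the scope of this sketch.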
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation
Existing 3D human pose estimators face challenges in adapting to new datasets
due to the lack of 2D-3D pose pairs in training sets. To overcome this issue,
we propose the Multi-Hypothesis Pose Synthesis
Domain Adaptation (PoSynDA) framework to bridge
this data disparity gap in the target domain. Typically, PoSynDA uses a
diffusion-inspired structure to simulate 3D pose distribution in the target
domain. By incorporating a multi-hypothesis network, PoSynDA generates diverse
pose hypotheses and aligns them with the target domain. To do this, it first
utilizes target-specific source augmentation to obtain the target domain
distribution data from the source domain by decoupling the scale and position
parameters. The process is then further refined through the teacher-student
paradigm and low-rank adaptation. With extensive comparison of benchmarks such
as Human3.6M and MPI-INF-3DHP, PoSynDA demonstrates competitive performance,
even comparable to the target-trained MixSTE model\cite{zhang2022mixste}. This
work paves the way for the practical application of 3D human pose estimation in
unseen domains. The code is available at https://github.com/hbing-l/PoSynDA.
Comment: Accepted to ACM Multimedia 2023; 10 pages, 4 figures, 8 tables
High prevalence of vitamin D deficiency among children aged 1 month to 16 years in Hangzhou, China
Background: Recent studies have suggested that vitamin D deficiency in children is widespread, but the vitamin D status of Chinese children has seldom been investigated. The objective of the present study was to survey the serum levels of 25-hydroxyvitamin D [25(OH)D] in more than 6,000 children aged 1 month to 16 years in Hangzhou (latitude: 30°N), the capital of Zhejiang Province, southeast China.
Methods: Children aged 1 month to 16 years who came for health examination to the child health care department of our hospital, the Children's Hospital affiliated to Zhejiang University School of Medicine, had blood drawn for 25(OH)D measurement. Serum 25(OH)D levels were determined by direct enzyme-linked immunosorbent assay and categorized as < 25, < 50, and < 75 nmol/L.
Results: A total of 6,008 children aged 1 month to 16 years participated in this cross-sectional study. The subjects were divided into subgroups according to age: 0-1y, 2-5y, 6-11y and 12-16y, representing the infancy, preschool, school age and adolescence stages respectively. The highest mean level of serum 25(OH)D was found in the 0-1y stage (99 nmol/L) and the lowest in the 12-16y stage (52 nmol/L). Accordingly, the prevalence of serum 25(OH)D levels of < 75 nmol/L and < 50 nmol/L was lowest among infants (33.6% and 5.4% respectively) and highest among adolescents (89.6% and 46.4% respectively). The mean levels of serum 25(OH)D and the prevalence of vitamin D deficiency varied with the seasons. In winter and spring, more than 50% of school age children and adolescents had a 25(OH)D level of < 50 nmol/L. With the threshold at < 75 nmol/L, all of the adolescents (100%) and 93.7% of school age children had low 25(OH)D levels in winter.
Conclusions: The prevalence of vitamin D deficiency and insufficiency among children in Hangzhou, Zhejiang Province, is high, especially among children aged 6-16 years. We suggest that the recommendation for vitamin D supplementation in Chinese children should be extended to adolescence.
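The prevalence figures above are the share of children whose serum 25(OH)D falls below each cut-off. A minimal sketch of that calculation follows, assuming a plain list of measurements in nmol/L. The function name and the sample values are invented for illustration and are not study data.

```python
def prevalence_below(levels_nmol_l, cutoff_nmol_l):
    """Percentage of measurements strictly below the cut-off."""
    if not levels_nmol_l:
        raise ValueError("no measurements")
    below = sum(1 for level in levels_nmol_l if level < cutoff_nmol_l)
    return 100.0 * below / len(levels_nmol_l)

# Invented sample values in nmol/L, evaluated at the three cut-offs from the Methods.
sample = [20.0, 45.0, 60.0, 80.0, 99.0]
report = {cutoff: prevalence_below(sample, cutoff) for cutoff in (25, 50, 75)}
```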